Abstract

Huffman coding is a widely used method for lossless data compression because it stores data optimally
based on how often each character occurs; the resulting codes are represented by Huffman trees. An n-ary
Huffman tree is a connected, acyclic graph in which each vertex has either n "children" vertices connected
to it or 0 children. Vertices with 0 children are called leaves. We let $h_n(q)$ represent the total number
of n-ary Huffman trees with q leaves. In this paper, we use a recursive method to generate upper and lower
bounds on $h_n(q)$ and obtain $h_2(q) \approx (0.1418532)(1.7941471)^q + (0.0612410)(1.2795491)^q$ for n = 2.
This matches the best results achieved by Elsholtz, Heuberger, and Prodinger in August 2011. Our approach
reveals patterns in Huffman trees that we use in our analysis of the Binary-Ternary (BT) trees we introduce.
Our research extends the study of Huffman trees to BT trees, which paves the way for designing
data-specific trees that minimize the storage space Huffman coding can waste. We prove a recursive formula
for the number of BT trees with q leaves. Furthermore, we provide analysis and further proofs toward
numeric bounds. Our discoveries have broad applications in computer data compression. These results may
also improve graphical representations of protein sequences that facilitate in-depth genome analysis used
in researching evolutionary patterns.

1 Introduction 

1.1 Background Information 

Huffman trees originated in 1952, when David A. Huffman devised an algorithm for lossless data compression
that produces an optimal, prefix-free code represented by a Huffman tree [1]. Huffman coding is "optimal"
in the sense that it minimizes the expected code word length. Additionally, the coding is "prefix-free" in
that no code word is a prefix of any other code word. Because of the significance of Huffman coding,
Huffman trees and their patterns have been extensively analyzed, and the enumeration of Huffman sequences
in particular is frequently studied.

1.2 Applications 

Huffman trees have applications in many fields. Most significantly, Huffman coding is a commonly used
method of data compression that serves as the final step in compressing MP3 files, MPEG files, and JPG
images to minimize the file size. Furthermore, the Huffman algorithm is a key component of the DEFLATE
algorithm, which compresses ZIP files and PNG images. Additionally, Huffman coding has led to advances in
biochemical fields. In a method formulated in [2], a binary Huffman tree is used for coding amino acids
based on their frequencies. The amino acid codes are then used to construct a lossless, 2-D graphical
representation of proteins, which was not previously possible.

Our research facilitates further opportunities to capitalize on the efficiency and simplicity of Huffman
coding by enumerating all possible sets of codes for a fixed number of characters. Furthermore, our
research on BT trees provides a new data structure with the potential to resolve two existing drawbacks of
Huffman trees. Given a large alphabet, the conventional binary Huffman tree generates codes of excessive
length. In higher-order Huffman trees, it is common for there to be leftover leaves that do not represent
anything, which results in wasted storage space. Our results provide an important step toward solving
these issues by introducing the idea of constructing trees based on the specific type of data that is
being compressed.

2 Definitions and Previous Work 
2.1 Definitions 

Before presenting our research, we need to define several key terms, functions, and sets. An n-ary Huffman
tree has one vertex on its lowest level (level 0), called the root, that has n children located on the next
level (level 1). Every other vertex can have either n or 0 children. Vertices that do not have children are
called leaves. All other non-root vertices have degree n + 1 and are called internal vertices.




Figure 2.1.1. Example of a binary Huffman tree. 

Figure 2.1.1 illustrates a binary Huffman tree with 6 leaves, two on level 2 and four on level 3. The single
vertex on level 0 is the root. The total number of leaves in a full n-ary tree is denoted by q. The number
of internal vertices on the second highest level, or the number of parents that have children on the highest
level, is denoted by s. For example, in Figure 2.1.1, s is equal to 2. Thus ns is the total number of leaves
on the top level, which we call p. A path from one vertex to another is made up of existing edges and
vertices and can pass through each level only once. The length of such a path is the number of edges that
it is composed of.
Figure 2.1.2. The blue edges and vertices indicate an example of a root-to-leaf path of length 3.

An n-ary Huffman sequence of length q is defined as a non-decreasing list of the root-to-leaf path lengths
for each of the q leaves in a full n-ary tree. Thus the Huffman sequence for Figure 2.1.2 would be
{2, 2, 3, 3, 3, 3}. We define $h_n(q)$ to be the total number of n-ary Huffman sequences of length q.
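The definition of $h_n(q)$ can be checked on small cases by brute force. The sketch below is not the
method of this paper: it assumes only that a Huffman sequence is determined by how many vertices on each
level are internal, and it counts those level profiles directly. The names `huffman_count` and `count`
are hypothetical, chosen for illustration.

```python
from functools import lru_cache


def huffman_count(n: int, q: int) -> int:
    """Count n-ary Huffman sequences of length q by enumerating level profiles."""

    @lru_cache(maxsize=None)
    def count(width: int, leaves_left: int) -> int:
        # width: number of vertices on the current level.
        # leaves_left: leaves still to be placed on this level or higher ones.
        if leaves_left < width:
            return 0  # each vertex here accounts for at least one leaf
        # Option 1: every vertex on this level is a leaf, which ends the tree.
        total = 1 if leaves_left == width else 0
        # Option 2: `internal` vertices get n children each; the rest are leaves.
        for internal in range(1, width + 1):
            total += count(n * internal, leaves_left - (width - internal))
        return total

    return count(1, q)  # level 0 holds only the root


if __name__ == "__main__":
    print([huffman_count(2, q) for q in range(2, 11)])
```

For n = 2 and q = 2, ..., 10 this gives 1, 1, 2, 3, 5, 9, 16, 28, 50. The call `huffman_count(3, 8)`
returns 0, since a full ternary tree cannot have 8 leaves.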

Two trees are Huffman-equivalent if their Huffman sequences are identical. For example, since the
Huffman sequences of the trees in Figures 2.1.1 and 2.1.2 are both {2, 2, 3, 3, 3, 3}, we count these two
trees as one. $T_n(p, q)$ is the set of Huffman-equivalence classes of full n-ary trees with q total leaves,
p of which are on the top level, and $t_n(p, q) = |T_n(p, q)|$. Thus $h_n(q)$ equivalently represents the
total number of equivalence classes of n-ary Huffman trees with q leaves. This is represented as
$h_n(q) = \sum_{p} t_n(p, q)$ because the right hand side encompasses all possible values of p.

We let k be the number of vertices with degree n + 1 in a full n-ary tree. The equation $q = n + k(n-1)$
can be easily reached by noting that the sum of the degrees of all vertices is double the number of edges,
and that the number of edges is one less than the total number of vertices. Since k determines q, we also
write $H_n(k)$ for the number of n-ary Huffman sequences of trees with k such vertices, so that
$H_n(k) = h_n(n + k(n-1))$.
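The degree-counting argument can be written out step by step. The derivation below uses only the
definitions above: q leaves of degree 1, one root of degree n, and k internal vertices of degree n + 1.

```latex
\begin{align*}
\sum_{v} \deg(v) &= \underbrace{q}_{\text{leaves}} + \underbrace{n}_{\text{root}}
                    + \underbrace{k(n+1)}_{\text{internal vertices}}, \\
|V| &= q + k + 1, \qquad |E| = |V| - 1 = q + k, \\
q + n + k(n+1) &= 2|E| = 2(q + k)
  \;\Longrightarrow\; q = n + k(n-1).
\end{align*}
```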

2.2 Previous Work 

Prior to and during our research, we closely studied recent work on enumerating Huffman trees. These
papers helped us discover new patterns that improve the current enumeration techniques used for studying
Huffman sequences. Furthermore, they inspired the approaches we use when taking on the challenge of
enumerating Binary-Ternary trees.
2.2.1 Paschke, Burkert, and Fehribach's Work [4]
Theorem 2.1. $\sum_{p \ge s} t_n(p, q) = t_n(ns, q + (n-1)s)$.

Proof: Paschke, Burkert, and Fehribach illustrated a bijection between $\bigcup_{p \ge s} T_n(p, q)$ and
$T_n(ns, q + (n-1)s)$. They showed a one-to-one correspondence by pointing out that for a tree T such that
$T \in \bigcup_{p \ge s} T_n(p, q)$, n children can be given to s of the p leaves on the highest level,
forming an element of $T_n(ns, q + (n-1)s)$. For example, one of the four leaves on the highest level is
chosen and given two children as shown in Figure 2.2.1.

They then showed that removing the ns leaves from the highest level of T' such that
$T' \in T_n(ns, q + (n-1)s)$ results in an element of $\bigcup_{p \ge s} T_n(p, q)$. In Figure 2.2.1, going
in reverse, the two leaves on the top level of the tree on the right are removed and we are back to an
element of the original set of trees. This inverse operation proves that the mapping between the sets is
onto as well. Therefore, the sets are equal in size.




Figure 2.2.1. An illustration of Theorem 2.1 
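Theorem 2.1 can be spot-checked numerically. The sketch below assumes the reading
$\sum_{p \ge s} t_n(p, q) = t_n(ns, q + (n-1)s)$ and computes $t_n(p, q)$ by brute force over level
profiles; the function names `t` and `theorem_2_1_holds` are hypothetical and are not taken from [4].

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def t(n: int, p: int, q: int, width: int = 1) -> int:
    """Brute-force t_n(p, q): classes with q leaves, p of them on the top level.

    `width` is the number of vertices on the current level (1 at the root).
    """
    if q < width:
        return 0
    # If all vertices on this level are leaves, this is the top level.
    total = 1 if q == width == p else 0
    # Otherwise `internal` vertices get n children and the rest become leaves.
    for internal in range(1, width + 1):
        total += t(n, p, q - (width - internal), n * internal)
    return total


def theorem_2_1_holds(n: int, q: int, s: int) -> bool:
    """Compare sum_{p >= s} t_n(p, q) with t_n(ns, q + (n - 1)s)."""
    left = sum(t(n, p, q) for p in range(s, q + 1))
    right = t(n, n * s, q + (n - 1) * s)
    return left == right


if __name__ == "__main__":
    ok = all(theorem_2_1_holds(2, q, s)
             for q in range(1, 13) for s in range(1, q + 1))
    print(ok)
```

This is a finite check for small q and s, not a substitute for the bijection argument.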



Theorem 2.2. If $n \ge 2$ and $k \ge n + 1$ then $H_n(k) \le \sum_{i=1}^{n+1} H_n(k-i)$, and if
$k \ge n + 3$ then $H_n(k) \ge \sum_{i=1}^{n} H_n(k-i) + H_n(k-(n+2))$.



Using applications of Theorem 2.1, the equation $q = n + k(n-1)$, and $H_n(k)$ as defined previously,
Paschke, Burkert, and Fehribach were able to reach these results, which inspired the method we use to
compute upper and lower bounds on the number of n-ary Huffman sequences.
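The two inequalities of Theorem 2.2 can be tried on small binary cases. The sketch below assumes that
both inequalities are non-strict and that $H_n(k) = h_n(n + k(n-1))$; it evaluates $H_n(k)$ by brute
force. The names `H`, `upper_holds`, and `lower_holds` are hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def _count(n: int, width: int, leaves_left: int) -> int:
    """Number of ways to finish a tree from a level holding `width` vertices."""
    if leaves_left < width:
        return 0
    total = 1 if leaves_left == width else 0
    for internal in range(1, width + 1):
        total += _count(n, n * internal, leaves_left - (width - internal))
    return total


def H(n: int, k: int) -> int:
    """H_n(k), assumed equal to h_n(n + k(n - 1))."""
    return _count(n, 1, n + k * (n - 1))


def upper_holds(n: int, k: int) -> bool:
    """H_n(k) <= sum_{i=1}^{n+1} H_n(k - i)."""
    return H(n, k) <= sum(H(n, k - i) for i in range(1, n + 2))


def lower_holds(n: int, k: int) -> bool:
    """H_n(k) >= sum_{i=1}^{n} H_n(k - i) + H_n(k - (n + 2))."""
    return H(n, k) >= sum(H(n, k - i) for i in range(1, n + 1)) + H(n, k - (n + 2))


if __name__ == "__main__":
    print(all(upper_holds(2, k) for k in range(3, 13)))
    print(all(lower_holds(2, k) for k in range(5, 13)))
```

For n = 2 the lower bound is attained with equality at k = 5, 6, 7, which is why the inequalities are
taken to be non-strict here.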
2.2.2 Elsholtz, Heuberger, and Prodinger's Work [5]

In 2011, Elsholtz, Heuberger, and Prodinger proved a very accurate asymptotic result on $h_n(q)$ by using
generating functions and the asymptotic growth of $h_n(q)$. Their method is able to compute
approximations for $R_1, R_2, c_1, c_2$ such that $h_n(q) \approx c_1 R_1^q + c_2 R_2^q$ for any value of n.

2.3 New Results 

In this paper, we provide and prove a simpler, recursive method that reaches the approximation of $h_n(q)$
in [5] and also exposes new patterns in Huffman trees. We then exploit these patterns in our analysis of
Binary-Ternary (BT) trees and prove a recursive formula for the number of BT trees with q leaves. Finally,
we offer insight as to how numerical bounds for BT trees can be reached.

3 Bounds on the Number of n-ary Huffman Sequences 

3.1 Introduction 

In this section we prove a recursive method of generating upper and lower bounds for $h_n(q)$ that converge
to the approximation reached by Elsholtz et al. Our recursive method reveals new patterns in Huffman trees
and provides a strong base for the study of more complex trees that can better compress data.

3.2 Proof of our Results 

Theorem 3.1 $t_n(ns, q) = h_n(q - s(n-1)) - \sum_{j=1}^{\lfloor (s-1)/n \rfloor} t_n(nj, q - s(n-1))$.

Proof: It follows from Theorem 2.1 that $t_n(ns, q) = h_n(q - s(n-1)) - \sum_{p<s} t_n(p, q - (n-1)s)$.
To figure out what to subtract from $h_n(q - s(n-1))$, we need to compare values of s to possible values
of p. Since the smallest value of p is always n, for all cases of s = 1, 2, ..., n, it is true that
$\sum_{p \ge s} t_n(p, q - (n-1)s) = h_n(q - s(n-1))$. Likewise, for the next n terms, namely
s = n + 1, n + 2, ..., 2n, the case where p = n is the only one in which p is smaller than s. Thus for
the second group of s values, it follows that

$\sum_{p \ge s} t_n(p, q - (n-1)s) = h_n(q - s(n-1)) - t_n(n, q - (n-1)s)$. To generalize this pattern, we
find that the general formula for every n terms is the same, which suggests dividing s by n and taking the
floor function. However, this causes problems when s = n and we get $\lfloor s/n \rfloor = 1$ when we want
it to be 0. To fix this, we subtract one before we divide, and obtain the desired upper limit
$\lfloor (s-1)/n \rfloor$.

From the results in Theorem 3.1, we can create charts where every $t_n(ns, q)$ term is represented by a sum
of $h_n(q - i)$ terms.

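i          1   2   3   4   5   6   7   8   9  10  11  12  13  14  15

t(2,q)     1   0   0   0   0   0   0   0   0   0   0   0   0   0   0
t(4,q)     0   1   0   0   0   0   0   0   0   0   0   0   0   0   0
t(6,q)     0   0   1  -1   0   0   0   0   0   0   0   0   0   0   0
t(8,q)     0   0   0   1  -1   0   0   0   0   0   0   0   0   0   0
t(10,q)    0   0   0   0   1  -1  -1   0   0   0   0   0   0   0   0
t(12,q)    0   0   0   0   0   1  -1  -1   0   0   0   0   0   0   0
t(14,q)    0   0   0   0   0   0   1  -1  -1  -1   1   0   0   0   0
t(16,q)    0   0   0   0   0   0   0   1  -1  -1  -1   1   0   0   0

Table 3.1. Representation of $t_2(p, q)$ as $h_2(q - i)$ terms: the entry in column i of a row is the
coefficient of $h_2(q - i)$, obtained from Theorem 3.1.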

Each term in the table can be generated recursively. For example, if the representation of $t_2(6, q)$ is
desired, then by Theorem 3.1 we know that $\lfloor (3-1)/2 \rfloor = 1$ case of s, namely s = 1, must be
subtracted from $h_2(q - 3)$. According to the chart, $t_2(2, q) = h_2(q - 1)$, so
$t_2(2, q - 3) = h_2(q - 3 - 1) = h_2(q - 4)$. Hence $t_2(6, q) = h_2(q - 3) - h_2(q - 4)$.
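The recursive generation of chart rows can be sketched in a few lines. The code below applies the
recursion of Theorem 3.1 symbolically, returning for each s the coefficients of the $h_n(q - i)$ terms;
the function name `row` and the dictionary representation are assumptions made for illustration.

```python
from collections import Counter


def row(n: int, s: int) -> dict[int, int]:
    """Coefficients c_i such that t_n(ns, q) = sum_i c_i * h_n(q - i), via Theorem 3.1."""
    shift = s * (n - 1)
    coeffs = Counter({shift: 1})            # the leading h_n(q - s(n-1)) term
    for j in range(1, (s - 1) // n + 1):    # subtract each t_n(nj, q - s(n-1))
        for i, c in row(n, j).items():
            coeffs[i + shift] -= c          # shifting q by `shift` shifts every index
    return {i: c for i, c in sorted(coeffs.items()) if c != 0}


if __name__ == "__main__":
    for s in range(1, 9):
        print(f"t({2 * s},q):", row(2, s))
```

For n = 2 and s = 3 this returns `{3: 1, 4: -1}`, matching the worked example
$t_2(6, q) = h_2(q - 3) - h_2(q - 4)$.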

3.3 The Bounds. 

Our bounds are generated based upon a given integer i, meaning that only the cases where s = 1, 2, ..., i
in $t_n(ns, q)$ will be used in the computation of the bounds. Using such a variable greatly increases
efficiency by allowing bounds of any desired precision to be computed. For example, if the value $h_3(15)$
is desired, there is no need to compute a recursive chart beyond the fourth row, the row representing
$t_3(3 \cdot 4, 15)$, because the rest of the rows represent degenerate terms.

It is important to note that there are restrictions on i. For example, there is no ternary tree that
has 8 leaves. Here we use the relationship between q, k, and n introduced in Section 2.1. Since
$q = n + k(n-1)$ can be alternatively expressed as $q = (k+1)(n-1) + 1$, it follows that
$q \equiv 1 \pmod{n-1}$. Since we also want $q - i \equiv 1 \pmod{n-1}$ to be true, n − 1 must divide i.
3.3.1 The Lower Bound

We know that $h_n(q) = \sum_{s=1}^{i} t_n(ns, q) + \sum_{s \ge i+1} t_n(ns, q)$ for some integer i. Since
$\sum_{s \ge i+1} t_n(ns, q)$ must be nonnegative because it counts the number of trees that satisfy a
certain condition, $\sum_{s=1}^{i} t_n(ns, q) \le h_n(q)$.

To compute the lower bound, we add up the first i rows in the chart to get a recursive formula for $h_n(q)$.
Then, we use the coefficients to determine a characteristic equation that we solve to get the explicit
formula for a lower bound. For higher values of q, in order to make the computation easier, we design a
program that computes a list of coefficients for a given integer i where the kth coefficient corresponds
to $h_n(q - k)$.

We first create an array, called $t_n(sn, q)[\,]$, representing the values of $t_n(p, q)$ for any given
p = sn. Then we set $t_n(sn, q)[s] = 1$, and for every $t_n(sn, q)[k]$, where k > n, we let
$t_n(sn, q)[k] = -\sum_{i=1}^{k-1} t_n(sn, q)[i]$. We then create another array, Cof[], such that
$\mathrm{Cof}[k] = \sum_{i=1}^{k-1} t_n(sn, q)[i]$. This final array then represents the recursive
coefficients for a lower bound on $h_n(q)$.
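As a rough illustration of this computation, the sketch below sums the first i chart rows into one
coefficient list and then locates a root of the resulting characteristic equation by bisection. It is not
the program described above: it rebuilds the rows directly from Theorem 3.1, and the names `row`,
`lower_bound_coefficients`, and `dominant_root`, as well as the bracket [1, 2] for the root, are
assumptions.

```python
from collections import Counter


def row(n: int, s: int) -> dict[int, int]:
    """Coefficients of h_n(q - i) in t_n(ns, q), following Theorem 3.1."""
    shift = s * (n - 1)
    coeffs = Counter({shift: 1})
    for j in range(1, (s - 1) // n + 1):
        for i, c in row(n, j).items():
            coeffs[i + shift] -= c
    return {i: c for i, c in coeffs.items() if c != 0}


def lower_bound_coefficients(n: int, i: int) -> dict[int, int]:
    """Sum rows s = 1..i; entry k is the coefficient of h_n(q - k)."""
    total = Counter()
    for s in range(1, i + 1):
        for k, c in row(n, s).items():
            total[k] += c
    return {k: c for k, c in sorted(total.items()) if c != 0}


def dominant_root(coeffs: dict[int, int], lo: float = 1.0, hi: float = 2.0,
                  steps: int = 60) -> float:
    """Bisection for r with sum_k c_k * r**(-k) = 1; needs a sign change on [lo, hi]."""
    def f(r: float) -> float:
        return 1.0 - sum(c * r ** (-k) for k, c in coeffs.items())

    if f(lo) * f(hi) > 0:
        raise ValueError("no sign change on the given bracket")
    for _ in range(steps):
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid   # root lies in [lo, mid]
        else:
            lo = mid   # root lies in [mid, hi]
    return (lo + hi) / 2


if __name__ == "__main__":
    cof = lower_bound_coefficients(2, 3)
    print(cof, dominant_root(cof))
```

For n = 2 and i = 3 the coefficients are `{1: 1, 2: 1, 3: 1, 4: -1}`, whose characteristic equation
$r^4 = r^3 + r^2 + r - 1$ has its largest root near 1.7221. Larger values of i can be passed in the same
way, provided the chosen bracket still contains a sign change.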

3.3.2 The Upper Bound 

Because

$$h_n(q-i-(n-1)) = t_n(n, q-i-(n-1)) + \sum_{p=2n}^{i} t_n(p, q-i-(n-1)) + \sum_{p \ge i+(n-1)} t_n(p, q-i-(n-1)),$$

it follows that

$$h_n(q-i-(n-1)) \ge t_n(n, q-i-(n-1)) + \sum_{p \ge i+(n-1)} t_n(p, q-i-(n-1)). \qquad (1)$$

By the same logic, $h_n(q-i-2(n-1)) \ge t_n(n, q-i-2(n-1)) + \sum_{p \ge i+2(n-1)} t_n(p, q-i-2(n-1))$.
Since we know by Theorem 3.1 that $t_n(n, q-i-(n-1)) = h_n(q-i-2(n-1))$, substituting this into (1)
yields that

$$h_n(q-i-(n-1)) \ge \Big( t_n(n, q-i-2(n-1)) + \sum_{p \ge i+2(n-1)} t_n(p, q-i-2(n-1)) \Big) + \sum_{p \ge i+(n-1)} t_n(p, q-i-(n-1)).$$

The second and third terms on the right hand side are in the summation $h_n(q) = \sum_{s \ge 1} t_n(ns, q)$.
We can continually generate inequalities of the form
$h_n(q-k) \ge t_n(n, q-k) + \sum_{p \ge k} t_n(p, q-k)$ for successively larger values of k.

Then, after substituting all these inequalities into (1), the result is

$$h_n(q-i-(n-1)) \ge \sum_{p \ge i+(n-1)} t_n(p, q-i-(n-1)) + \sum_{p \ge i+2(n-1)} t_n(p, q-i-2(n-1)) + \sum_{p \ge i+3(n-1)} t_n(p, q-i-3(n-1)) + \sum_{p \ge i+4(n-1)} t_n(p, q-i-4(n-1)) + \cdots$$

This shows that $h_n(q-i-(n-1))$ is at least as large as all the terms that the lower bound neglects to
account for. Thus $\sum_{s=1}^{i} t_n(ns, q) + h_n(q-i-(n-1)) \ge h_n(q)$. To compute this bound, the first
term in the (i + 1)st row of the chart is added to the lower bound. Alternatively, 1 can be added to the
(i + (n − 1))th term in the list of coefficients for the lower bound.
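The relation between the two coefficient lists can be tried on a small case. The sketch below builds the
lower-bound coefficients from Theorem 3.1, adds 1 at index i + (n − 1) to obtain the upper-bound
coefficients, and compares both against brute-force values of $h_2(q)$. It is a numerical spot check for
n = 2, i = 5, and 9 ≤ q ≤ 15 only, not a proof; that range is chosen (as an assumption of this sketch) so
that every $h_2$ argument is at least 2. All function names are hypothetical.

```python
from collections import Counter
from functools import lru_cache


@lru_cache(maxsize=None)
def _count(n: int, width: int, leaves_left: int) -> int:
    """Brute-force number of ways to finish a tree from a level of `width` vertices."""
    if leaves_left < width:
        return 0
    total = 1 if leaves_left == width else 0
    for internal in range(1, width + 1):
        total += _count(n, n * internal, leaves_left - (width - internal))
    return total


def h(n: int, q: int) -> int:
    """h_n(q) by brute force; returns 0 for q < 1."""
    return _count(n, 1, q)


def row(n: int, s: int) -> dict[int, int]:
    """Coefficients of h_n(q - i) in t_n(ns, q), following Theorem 3.1."""
    shift = s * (n - 1)
    coeffs = Counter({shift: 1})
    for j in range(1, (s - 1) // n + 1):
        for i, c in row(n, j).items():
            coeffs[i + shift] -= c
    return dict(coeffs)


def bound_coefficients(n: int, i: int) -> tuple[dict[int, int], dict[int, int]]:
    """Return (lower, upper) coefficient dictionaries for the first i rows."""
    lower = Counter()
    for s in range(1, i + 1):
        for k, c in row(n, s).items():
            lower[k] += c
    upper = Counter(lower)
    upper[i + (n - 1)] += 1          # add the h_n(q - i - (n-1)) term
    return dict(lower), dict(upper)


def evaluate(coeffs: dict[int, int], n: int, q: int) -> int:
    """Evaluate sum_k c_k * h_n(q - k)."""
    return sum(c * h(n, q - k) for k, c in coeffs.items())


if __name__ == "__main__":
    lower, upper = bound_coefficients(2, 5)
    for q in range(9, 16):
        print(q, evaluate(lower, 2, q), h(2, q), evaluate(upper, 2, q))
```

For q = 13, for instance, the printed triple is 284, 285, 293, so the true value lies between the two
bounds.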

The difference between the bounds is $h_n(q-i-(n-1))$. The value of $h_n(q-i-(n-1))$ becomes smaller
and eventually reaches zero as i approaches q, showing that the upper and lower bounds converge. Knowing
this, we can reach an approximation of $h_n(q)$ by solving the characteristic equation to get the values of
r such that $h_n(q) \approx c_1 r_1^q + c_2 r_2^q + \cdots$. To solve for the $c_k$ constants, we find the
value of $h_n(q)$ for a large value of q and then plug in arbitrarily large values of i. Realistically,
however, the $c_k$ are extremely difficult to calculate beyond k = 2, the result that Elsholtz et al.
reached.

For example, when we use this method to calculate the numeric bounds for binary trees, we first solve
the characteristic equation for the two largest roots. We then compute $h_2(q)$ for arbitrarily large
values of q to determine the values of $c_1$ and $c_2$. Our result is

$$h_2(q) \approx (0.1418532)(1.794147187541)^q + (0.061241041)(1.2795491341242096)^q.$$

This approximation is the same as the one reached by Elsholtz et al. [5]. Similar steps can be taken for
higher values of n.
4 Analysis of Binary-Ternary trees
4.1 Introduction

In order to solve some of the issues that Huffman trees pose, such as the potential wasted storage space, we
decided to create a new data structure. In this type of tree, the number of children a parent can have
varies from level to level. We study trees with vertices that can have either 0, 2, or 3 children,
depending on the level that the vertex is located on. We call these trees Binary-Ternary trees. Our first
area of study focuses on trees where the number of children each vertex can have depends on the parity of
its level. In other words, vertices on even levels, including level 0, can have two or zero children, while
vertices on odd levels can have either three or zero children. We refer to these trees specifically as 2,3
trees, as shown in Figure 4.1.




Figure 4.1. An example of a 2,3 tree.

Because the parity of the level determines how many children every vertex on that level can have, we
break up the 2,3 trees into two categories: those with an odd top level and those with an even top level.
The number of 2,3 trees with q total leaves and an even or odd top level is represented by $e_{2,3}(q)$ or
$o_{2,3}(q)$, respectively; where no confusion arises we abbreviate these as $q_e$ and $q_o$. The number of
2,3 trees with p out of q total leaves on an even top level is denoted by $t_{2,3}(p, q_e)$, whereas if the
highest level is odd, the quantity is represented by $t_{2,3}(p, q_o)$. $T_{2,3}(p, q)$ is the set of
equivalence classes of 2,3 trees that have p leaves on the highest level.

To the best of our knowledge, we are the first to study BT trees. In this section, we prove a recursive
formula for the number of 2,3 trees with q leaves. Furthermore, we analyze and provide further proofs as to
how numerical bounds of BT trees can be reached.
4.2 The Proof

Theorem 4.1.1 $t_{2,3}(2s, q_o) = (q-s)_e - \sum_{j=1}^{\lfloor (s-1)/3 \rfloor} t_{2,3}(3j, (q-s)_e)$

Proof: When building a $q_o$ tree, we think of it as starting out with a $(q-s)_e$ tree with p leaves on
its highest level, where p ≥ s, and then giving two children to s of the p top-level leaves. This results
in a tree with q leaves and an odd top level.




Figure 4.1.1. Blue vertices represent a $(q-s)_e$ tree. Adding 2s leaves raises the total number of leaves to q.

We can compute the number of $q_o$ trees by taking the number of possible $(q-s)_e$ trees and subtracting
the cases where p < s. Since the p value that s is being compared to represents the number of leaves on the
top level of a $q_e$ tree, p is always a multiple of 3. This means that the greatest p value that
s = 3j − 2, 3j − 1, 3j are all greater than is the same. Thus the same number of terms is subtracted for
every triplet of s values of that form. We can then write the number of cases of $t_{2,3}(p, (q-s)_e)$ that
need to be subtracted as $\lfloor (s-1)/3 \rfloor$, because plugging in the values 3j − 2, 3j − 1, 3j all
result in the same number, j − 1.

Theorem 4.1.2 $t_{2,3}(3s, q_e) = (q-2s)_o - \sum_{j=1}^{\lfloor (s-1)/2 \rfloor} t_{2,3}(2j, (q-2s)_o)$

Proof: Similar to our method of building a $q_o$ tree in Theorem 4.1.1, we consider constructing a $q_e$
tree by starting out with a $(q-2s)_o$ tree with p leaves on its highest level, where p ≥ s. We start with
such a tree because adding three children onto s of the top-level leaves causes a net increase of 2s
leaves, which gives us a total of q leaves.

Figure 4.1.2. Blue vertices represent a $(q-2s)_o$ tree. This time, three children are given to s leaves.

We can thus compute the number of $q_e$ trees by starting with the total number of $(q-2s)_o$ trees and
subtracting the cases where p < s. In this computation, s is being compared to p, the number of leaves on
the highest level of a $q_o$ tree, which means that p has to be even. Then by the same logic as in Theorem
4.1.1, when s = 2j − 1, 2j, the number of cases of p that need to be subtracted is the same. We then find
an expression that yields equal results when 2j − 1 and 2j are plugged in, which is
$\lfloor (s-1)/2 \rfloor$.
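Both theorems can be compared against a brute-force count on small trees. The sketch below assumes the
floor limits $\lfloor (s-1)/3 \rfloor$ and $\lfloor (s-1)/2 \rfloor$ with subtracted terms
$t_{2,3}(3j, \cdot)$ and $t_{2,3}(2j, \cdot)$, and it counts 2,3 trees by their level profiles. The names
`t23`, `total23`, `rhs_4_1_1`, and `rhs_4_1_2` are hypothetical. The single-vertex tree (q = 1, top level
0) has one top-level leaf rather than a multiple of 3, so the check of Theorem 4.1.1 is restricted to
q − s ≥ 2; this restriction is an assumption of the sketch.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def t23(p: int, q: int, top_parity: int, width: int = 1, level_parity: int = 0) -> int:
    """Count 2,3 trees with q leaves, p of them on the top level, whose top
    level has parity `top_parity` (0 = even, 1 = odd)."""
    if q < width:
        return 0
    total = 1 if (q == width == p and level_parity == top_parity) else 0
    arity = 2 if level_parity == 0 else 3   # even levels branch by 2, odd by 3
    for internal in range(1, width + 1):
        total += t23(p, q - (width - internal), top_parity,
                     arity * internal, 1 - level_parity)
    return total


def total23(q: int, top_parity: int) -> int:
    """e_{2,3}(q) for top_parity 0 and o_{2,3}(q) for top_parity 1."""
    return sum(t23(p, q, top_parity) for p in range(1, q + 1))


def rhs_4_1_1(s: int, q: int) -> int:
    """Right-hand side of Theorem 4.1.1 for t_{2,3}(2s, q_o)."""
    m = q - s
    return total23(m, 0) - sum(t23(3 * j, m, 0) for j in range(1, (s - 1) // 3 + 1))


def rhs_4_1_2(s: int, q: int) -> int:
    """Right-hand side of Theorem 4.1.2 for t_{2,3}(3s, q_e)."""
    m = q - 2 * s
    return total23(m, 1) - sum(t23(2 * j, m, 1) for j in range(1, (s - 1) // 2 + 1))


if __name__ == "__main__":
    print([total23(q, 1) for q in range(1, 9)], [total23(q, 0) for q in range(1, 9)])
```

For example, `total23(7, 1)` is 2: one tree arises from the 4-leaf even-top tree by expanding all three
top leaves, and one from the 6-leaf even-top tree by expanding a single top leaf.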

4.2.1 The Recursive Formula. 

Using Theorems 4.1.1 and 4.1.2, we can express the values of $q_o$ and $q_e$ in terms of previous
$(q-i)_o$ and $(q-i)_e$ terms and observe the patterns noted in these theorems. For example, the
representation of $q_o$ is shown below.
We again use the technique of computing bounds up to a certain term $t_{2,3}(ni, q)$, where n = 2 when the
tree has an odd highest level and n = 3 when the tree has an even highest level, for a given integer i. We
can express $q_o$ as several summations:

$$q_o = \sum_{k=1}^{i}(q-k)_e - \sum_{t=0}^{i-4}(q-6-t)_o - \sum_{t=0}^{i-7}(q-11-t)_o - \cdots + \sum_{v=0}^{i-10}(q-17-v)_e + \sum_{v=0}^{i-16}(q-28-v)_e + \cdots$$

Putting these together yields

$$q_o = \sum_{k=1}^{i}(q-k)_e - \sum_{r=1}^{\lfloor (i-1)/3 \rfloor}\sum_{t=0}^{i-3r-1}(q-5r-1-t)_o + \sum_{w=1}^{\lfloor (i-4)/6 \rfloor}\sum_{v=0}^{i-6w-4}(q-11w-6-v)_e.$$

In a similar way, it can be shown that

$$q_e = \sum_{k=1}^{i}(q-2k)_o - \sum_{m=1}^{\lfloor (i-1)/2 \rfloor}\sum_{f=0}^{i-2m-1}(q-5m-2-2f)_e + \sum_{b=1}^{\lfloor (i-7)/2 \rfloor}\sum_{c=0}^{i-2b-7}(q-5b-19-2c)_o.$$
fc=l m=l / =0 
The characteristic root method for finding an explicit formula for a recursion is based on the fact that
all terms belong to the same sequence. This means that we cannot find numerical bounds on BT trees if we
have $q_o$ written in terms of a mix of $(q-i)_o$ and $(q-i)_e$ terms, and vice versa. We must find some
way to express $q_o$ in strictly $(q-i)_o$ terms and $q_e$ in strictly $(q-i)_e$ terms. Theorems 4.2 and
4.3 that follow aim to take a step toward that goal.



Theorem 4.2 $(q+1)_o \le q_o + q_e$



Proof: If we want to plug q + 1 into the current formula for computing $q_o$, we are essentially keeping
the lower limits but shifting the index of everything else by 1. Thus we need to modify the current formula
by adding 1 to all the upper limits, because $(q-i)$ is the smallest term that we want to include in our
representation. Then we get:

$$(q+1)_o = \sum_{k=1}^{i+1}(q+1-k)_e - \sum_{r=1}^{\lfloor i/3 \rfloor}\sum_{t=0}^{i-3r}(q-5r-t)_o + \sum_{w=1}^{\lfloor (i-3)/6 \rfloor}\sum_{v=0}^{i-6w-3}(q-11w-5-v)_e.$$

We then separate the proof into three cases of q.

Case 1: q ≡ 1, 2, 4, 5 (mod 6). Since the upper limits of the sums over r and over w are then the same in
the formulas for $q_o$ and $(q+1)_o$, substitution yields

$$(q+1)_o - q_o = q_e - \Big( \sum_{r}(q-5r)_o - \sum_{w}(q-11w-5)_e \Big),$$

where r and w run over the same ranges as in the formula for $(q+1)_o$. It is clear that

$$\sum_{r}(q-5r)_o - \sum_{w}(q-11w-5)_e \ge 0.$$

Thus $(q+1)_o - q_o \le q_e$.

Case 2: q ≡ 0 (mod 6). Here the upper limit of the sum over r is greater by 1 in the formula for
$(q+1)_o$ than in the formula for $q_o$, while the upper limit of the sum over w is unchanged, so the
$(q+1)_o$ term in this case is greater than the $(q+1)_o$ term in Case 1. Thus the claim holds true.
Case 3: q ≡ 3 (mod 6). Now the upper limits of both the sum over r and the sum over w are greater by 1,
so the $(q+1)_o$ term is greater than in Case 2, and thus the claim holds true as well.



This theorem concludes that $(q+1)_o - q_o \le q_e$. When there are $(q-i)_e$ terms in the representation
of $q_o$, this inequality could potentially be used to substitute $(q-i)_o$ terms for all the $(q-i)_e$
terms to get a characteristic equation that would determine a lower bound.
Theorem 4.3 $(q+2)_e \le q_o + q_e$



Proof: If we want to plug q + 2 into the current formula for computing $q_e$, then we need to modify the
current formula by adding 2 to all the upper limits, because the smallest term we want the answer to be
expressed in terms of is q − i. The result is

L ~T~ J L ~2~ J n -2m+l L — g— J . 

(q + 2) e = (l + 2 - 2fc )o - E E (« - 5m - 2/) e + ]T E "(« - 56 - 2c - 17) 

fe=l m=l 



Now we do the subtraction and get 

(q + 2) e -q e = (q + 2 -2[^\) o - p(q - 5 (^j) - 2/)„ + g(« - 5 (L^j) - 2c - 17%. 

Using the logic similar to that of Theorem 4.2, it is clear that 
J> - 5 (L^j) - 2/)e - J> - 5 (L^j) - 2c - 17)„ > 0. 

Additionally, we also can note that g > (q + 2 — 2[ Ii ^J) o . Setting 

« = p{Q 5 (L^j) - 2/)e - B« - 5 (L^j) - 2c - 17 )o 

So g Q > q c - a > [q + 2 - 2[ n ^-\) o - a. Thus q a > (q + 2) e - q e and the result follows. 
This theorem accomplishes something similar to Theorem 4.2. It provides an inequality that can be used to
substitute $(q-i)_o$ terms for $(q-i)_e$ terms and potentially reach a lower bound, since what we are
substituting in is smaller than the original term.
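The inequalities of Theorems 4.2 and 4.3 can be observed on small cases with a brute-force count. The
sketch below assumes the statements $(q+1)_o \le q_o + q_e$ and $(q+2)_e \le q_o + q_e$ and counts 2,3
trees by level profiles; the function name `count23` is hypothetical, and the check covers only q ≤ 16.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count23(q: int, top_parity: int, width: int = 1, level_parity: int = 0) -> int:
    """Number of 2,3 trees with q leaves whose top level has parity `top_parity`
    (0 = even, giving q_e; 1 = odd, giving q_o)."""
    if q < width:
        return 0
    # All vertices on this level are leaves: the tree ends here.
    total = 1 if (q == width and level_parity == top_parity) else 0
    arity = 2 if level_parity == 0 else 3   # even levels branch by 2, odd by 3
    for internal in range(1, width + 1):
        total += count23(q - (width - internal), top_parity,
                         arity * internal, 1 - level_parity)
    return total


def q_o(q: int) -> int:
    return count23(q, 1)


def q_e(q: int) -> int:
    return count23(q, 0)


if __name__ == "__main__":
    for q in range(1, 17):
        print(q, q_o(q + 1), q_e(q + 2), q_o(q) + q_e(q))
```

Each printed row lists $(q+1)_o$ and $(q+2)_e$ next to the common bound $q_o + q_e$, so the slack in
each inequality can be read off directly.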



5 Future Research 

Through our research we devise a recursive formula that can be used to count BT trees. In order to
generate numeric bounds, we must be able to cancel out all the $(q-i)_o$ terms in the representation of
$q_e$ and vice versa. Theorems 4.2 and 4.3 are provided for this purpose. However, we believe that more
accurate results can be achieved. Additionally, we plan to study more BT trees such as 2,2,3,3 trees, 3,2
trees, and other trees whose sequences have different combinations of 2s and 3s. For 3,2 trees, we can use
an approach similar to that for 2,3 trees. However, for other trees (such as 2,2,3,3 trees) we will have to
consider k cases, where k is the period of the sequence. We conjecture, and hope to prove, that if the
ratio between 2s and 3s in a sequence is the same for various BT trees, then the trees will have similar
or identical bounds.

6 Conclusion 

The first portion of our results gives an algorithm for finding upper and lower bounds on the number of
n-ary Huffman sequences of length q with a specified accuracy. The algorithm simplifies the method of
reaching an approximation of $h_n(q)$ that matches the currently most accurate results. The method we prove
leads to our research in the second portion of our results. The second part extends the study of Huffman
sequences by enumerating trees with vertices that may have 2 or 3 children, depending on the level that
they are located on. We start by splitting the Binary-Ternary trees into cases based on whether the highest
level is even or odd. We then show that such trees hold many patterns that are similar to patterns in
n-ary Huffman trees. To the best of our knowledge, this is the first paper that studies BT trees. Our
results have significant applications in computer data compression and in-depth genome analysis.